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1. Introduction 

The dynamics of recurrent epidemics are an active topic of research in a wide range 
of disciphnes: from ecology and epidemiology to applied mathematics and theoretical 
physics P-[3]. Much of the reason for this can be attributed to the large amount of 
historical data available |H|5] , and the rich variety of highly oscillatory patterns that are 
observed in both the data and relatively simple stochastic models [6l[7]. 

The systematic formulation of individually-based models of epidemics starting 
from a master equation (continuous-time Markov chain) has permitted a parallel 
analytical and numerical study to be carried out, which has allowed a rather 
complete understanding of these stochastic models to be achieved. One of the main 
features of these models is that demographic stochasticity tends to excite the natural 
dynamics of the system leading to large scale coherent oscillations, termed stochastic 
amplification [8]. It has been shown how the frequency and amplitude of these 
oscillations depend on the parameters of the system and can provide a parsimonious 
explanation for the range of frequencies observed in real data [3|[9HTT]. In all of these 
studies though, the population is assumed to be very large so low numbers of infectives, 
or fade-out of the disease, is unlikely. 

For childhood diseases especially, where the infectious period is orders of magnitude 
smaller than the lifetime pL2j , the size of the epidemic outbreaks can be so large that the 
number of infectives can become small and fade-out of the disease is very probable, even 
in medium-size populations Because of this, some form of migration is commonly 
included in these models to re-introduce the disease after fade-out. This is a biologically 
realistic feature which is not needed in deterministic models where fade-out cannot 
happen. The presence of this fade-out boundary in the stochastic model means there 
is a natural asymmetry in the state space, which can have a considerable effect on the 
dynamics of the fluctuations in smaller-to-medium size systems. Figure [1] illustrates 
the dynamics of a stochastic epidemic model at three different population sizes. As the 
population is decreased the probability of a large outbreak, relative to the mean infection 
level, increases. These smaller systems are important as they are representative of the 
majority of towns and cities for which extensive case-report data exists [I3], but have 
so far received little theoretical attention. 

In this paper we calculate analytically — via a Wentzel-Kramers-Brillouin (WKB) 
approximation of the master equation — the outbreak distribution (marginal infective 
probability distribution) of a susceptible-infected-recovered (SIR) model. We show how 
the asymmetry in the system affects this distribution and in particular how it changes 
with population size. In the limit N ^ oo, the distribution tends to a Gaussian, as 
is expected and used in the system-size expansion of the master equation [H]. As 
the population is made smaller the distribution becomes skewed, rapidly acquiring fat- 
tails indicating an increased probability of rare, but large, outbreaks. All results are 
compared with stochastic simulations and, within the approximation's range of validity, 
the agreement is excellent. Although we have used the term analytic to refer to the 
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Figure 1. Infective time series illustrating how the dynamics of the model change 
as the total population, N, is decreased, (a) = 2 x 10^, the number of infectives 
oscillates about the mean, but fade-out does not happen, (b) = 3 x 10^, fade-out 
does occur but rarely, (c) N = 10^, here outbreaks are mainly triggered by immigration 
events and rapidly lead to fade-out. The model and parameters which generate these 
are described in Sections 2 and 3. 



method which we use, it should be understood by this that we use analytic techniques 
to obtain equations or expressions for functions making up probability distributions, 
but numerical techniques have to be used to solve these to obtain the final results. 

The WKB approximation has a long history in the theory of stochastic processes, 
and was the basis of Kramer's calculation of escape of a particle over a potential barrier 
due to thermal noise [15]. The method was originally used on Fokker-Planck equations 
with continuous variables x and consisted of making an ansatz for the probability of 
the form P(x) ~ exp [— iVS'(x)], where N was a large parameter such a the volume 
of the system or the inverse diffusion constant. The function S{x) turns out to obey 
a Hamilton- Jacobi equation [16], and and so leads to a definition of a Hamiltonian 
dynamics which describes the stochastic dynamics of rare events, such as large outbreaks. 

For the situation discussed in this paper we require a variant of the method applied 
to the master equation, where the variables are discrete rather than continuous [T714T9] . 
However the method is similar and the problem of describing strongly stochastic effects 
near the fade-out boundary is transformed into one of classical mechanics and of finding 
zero energy trajectories of a Hamiltonian. It should be stressed that this deterministic 
dynamics is not simply the N ^ oo dynamics of the original stochastic model, but an 
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auxiliary dynamics which is able to give a very good description of processes that cannot 
be described by the conventional deterministic dynamics found by taking N ^ oo. 

The method allows higher-order corrections to the leading exp [— iVS'(x)] to 
be calculated, the next-order correction being a prefactor -ft'(x) multiplying the 
exponential [201I2T] . There have been a number of applications of this method in the last 
few years, for example to calculate fixation and extinction times from quasi-stationary 
to absorbing states [22] - [25] . or transition times between metastable states [T9 | [26 | l27]. 
The WKB method is especially suitable for these type of problems as the aim is to only 
find one special trajectory. The work in this paper differs from these previous studies 
in that we are primarily concerned in studying the dynamics of the model, and hence 
in calculating the full stationary distribution of the master equation, not fixation times 
(there is no fixation in our model since immigration is included). Another distinguishing 
aspect of our work is that we also calculate the pre-factor term for the multi-dimensional 
model, which as will be shown, is crucial for accurate calculation of the probability 
density. This has not so far been computed for a model of this type. 

The rest of this paper is organised as follows: in Section |2] we introduce the theory 
behind the WKB approximation of the master equation and in Section [3] describe the 
numerical procedure that we adopt to apply it to calculate approximate distributions. 
In Section m we apply these methods to the SIR model with immigration. We end with 
a discussion of our results and outline a number of open questions. 



2. General formalism 



In this section we will discuss the WKB approximation to the master equation and 
its application to the SIR model. Although much of the formalism we require appears 
in various places in the literature it is somewhat scattered, so we first give a coherent 
summary based on some of the clearer discussions that have been published p^142T] . 

The starting point is the master equation for the time-evolution of Pn{t), which is 
the probability of finding the system in state n at time t: 

= Y: [Tr{n - r)P_(t) - Tr(n)P„(t)] . (1) 

Here r labels the transitions by giving the size of the jump. Both n and r are vectors 
of integers; for the SIR model they are two-dimensional with n = (n, m), where n and 
m are the number of susceptible and infected individuals respectively. Setting the left- 
hand side of Equation ([T]) to zero we find the time-independent equation specifying the 
stationary distribution, 

[Tr(n - r)P„„r - rr(n)P„] = . (2) 

r 

We now expand the transition rates in terms of the system size, S> 1. In our case 
this is the number of individuals in the system. Specifically we write 

Tr(n) = NWrin) + Ur{n) + 0{1/N) , (3) 
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where Vrr(n) and f/r(n) are of order unity. For the SIR Model, the transition rates are 
such that f/r(n) and the higher-order corrections are zero, and so Tj. = NW^. 

Finally we introduce the fraction of individuals who fall into the different classes: 
X = n/N; for the SIR model we will write x = (x, y) where x = n/N and y = m/N. So 
letting PVr(n) = Wr{Nx.) = Wr(x) and also Pn = -P/Vx = 7r(x), the master equation ([2]) 
now reads 



E 



^V(x - ^)7r(x - ^) - t/^r(x)7r(x) 



0. (4) 



2.1. The WKB approximation 



We can now apply the WKB approximation to 7r(x) by expressing it as 



7r(x) = K(x) exp(-iV5(x)) 



(8) 



where both S'(x) and -ft'(x) are assumed to be of order unity and ^ 1. Next we 
expand ^(x — {r/N)) to second order: 

- ^) = - • V.5(x) + ^ (r • V.)^ S(x) + ^ (^) , (6) 

and K(x) to first order: ir(x - r/N) = K{x.) - r ■ Vxi^(x)/A^ + 0{1/N^). Substituting 
into the master equation and comparing terms, at leading order we find a Hamilton- 
Jacobi equation, corresponding to the Hamiltonian 

if(x, p) = 5]^i;r(x)[exp(r ■ p) - 1] = 0. (7) 

r 

Here p = Vx5'(x). The prefactor K{x) is determined by the next-order contributions 
in A^~^ and is discussed in Section 12.31 It should not be too surprising that a 
Hamilton- Jacobi equation is found using the form ([5]): Markovian stochastic processes 
are equivalent to quantum mechanics in imaginary time [28j and we are interested in 
the limit where (h) goes to zero. 

Hamilton's equations are found from Eq. (j7]) to be: 

X = VpH = '^r Wr(x) exp (r • p) , 

r 

p = - V^H = - ^[exp (r ■ p) - 1] VxWr(x) , (8) 

r 

where the dot denotes differentiation with respect to time. From the solution of these 
equations we will find trajectories, called the fluctuational trajectories, x/(t) and the 
"momenta" along these trajectories P/(t). For the zero-energy solutions that we are 
interested in, the action along a given fluctuational trajectory is given by 

Sf = fpf X/ dt' . (9) 

J to 

From Eq. (IHD we see that p/ = is always one solution, with 

± = Y.rw,{^). (10) 
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This is called the relaxation trajectory for the following reason. If we multiply the 
original master equation ([1]) by n, sum over all n, and then shift the first sum by r, 
we find that d{n)/dt = J2r^Tr{{n)), in the limit — oo. Introducing x and tOr(x) 
as above, gives exactly Eq. (fTOj) . Therefore Eq. ( ITOj) describes the actual deterministic 
dynamics of the system which would occur in the limit N ^ oo. This is not the 
trajectory of primary interest to us in this paper; instead our focus will be on solutions 
of Eq. ([8]) with pj 7^ 0, which correspond to non-allowed trajectories in the iV — )■ 00 
limit. Before we discuss these, we will apply this formalism to the SIR model. 

2.2. The SIR model with immigration 

The SIR model consists of three classes of individuals, susceptibles denoted by 5*, 
infected denoted by / and recovered denoted by R [12j. We assume that births and 
deaths are coupled at the individual level, so that when an individual dies another 
(susceptible) individual is born [8J . This means that the number of individuals, A^, does 
not change with time, and so the number of recovered individuals is not an independent 
variable: nji = N — n — m. Immigration is included with a commuter formalism where 
susceptibles are assumed to be in contact with a constant pool of infectives outside the 
main community [ZIIH]. 

The transitions in the model are then of the following type: 

(i) Infection: ^ + / ^ / + / and 5 ^ /. 

/3— m + rjnj . 

(ii) Recovery: / R. 

T{n,m — l\n,m) = 7m. 

(iii) Death of an infected individual: / — ^ 5*. 

T{n + l,m — l\n,m) = fim. 

(iv) Death of a recovered individual: R — ^ S. 

T{n + l,m\n, m) = fi{N — n — m). 

Expressing this model in terms of the quantities introduced at the start of this 
section, we find that, as already mentioned, all corrections to NWr{n) in Eq. (|3]) are 
zero, and that Wr(x) is given by 

(i) Infection: Wr(x) = f3xy + rjx, r = (—1, +1) 

(ii) Recovery: Wr(x) = •yy, r = (0, —1) 

(iii) Death of an infected individual: Wr(x) = fiy, r = (+1, —1) 

(iv) Death of a recovered individual: Wr(x) = — x — y), r = (+1, 0) 
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Substituting these rates into ([7]) we find the Hamiltonian 

i/(x, p) = {(3xy + r]x)[e'P^+Py - 1] + -fyle'Py - 1] 

+ ijy[e-Py+P^ - 1] + - a; - y)[eP'^ - 1] . (11) 

Related Hamiltonians have been used by previous authors [22l[24]. The relaxation 
trajectory can be found by setting p = in the Hamilton equations found from Equation 
f|TT]) . or by simply substituting the Wr(x) for the SIR model into Equation (fTO|) . Either 
way this gives the deterministic SIR equations: 

X = — f3xy — rjx + /i(l — x), 

y = [ixy + Tfx - + ii)y . (12) 

From Equation f|T2l) we see that the fixed points (denoted by an asterisk) satisfy 
y* = ^{1 — x*)/(7 + fi). Eliminating y* shows that x* may be found by solving 

(7 + /i) 



„*\2 



[X 



fi (3 (3 



/3 



0. 



(13) 



This equation has two fixed points: a 'trivial' one which vanishes if = and a 
non-trivial (endemic) one. This is the one which will be of interest to us in the 
rest of the paper. The focus will be on the fluctuational trajectories which after a 
perturbation to this fixed point at t = 0, rapidly move away and end near the extinction 
boundary 



2.3. Calculation of the Pref actor K{x) 

An equation for the prefactor -ft'(x) is found from examining the second-order 0{N~^) 
terms in the expansion of the master equation. From this we find the equation 



exp(r ■ p) 



2g ^ r-VxWr 



0. 



(14) 



K 2 Wr 

If ?7r 7^ in the expansion of the transition rates ([3]), then there are extra terms in 
this equation [27]. Using Hamilton's equations (|8]), and their derivatives, we may write 
Equation in the alternative form 



K ^ dpi dxi 2 dxidxj dpidpj ^ 



.J . dpidxi 
Since K has no explicit time dependence, we have that 



0, 



dK 
~~dt 



E 



dK 



Xi 



E 



dH OK 



dxi ^ dpi dxi 
Using this result in Eq. f|T5|) we find a differential equation for K, 



dK 
~dt 



E 



dpidxi 2 ^ dxidxj dpidpj 



K. 



(15) 



(16) 



(17) 



To find K{x.) from this equation, we need to know the Hessian, Zij = d'^S/dxidxj. 
It is important to realise that we are only interested in quantities which vary along 
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the trajectories in phase space found by solving Hamilton's equations, so the momenta 
are now functions of x. Differentiating H = twice with respect to x taking care to 
distinguish between the cases where x and p are independent and when they are not, 
one finds the following equation for the matrix Z [20l[2T|[29l[30] : 

0. (18) 



% + ZikZji + Zji + - — —Zii + 



This can be written matrix equation 

Z + ZHppZ + Hy^pZ + ZHpy^ + ifxx = , 
where H^h = d^dhH ■, with a and b being x or p. 



(19) 



2.4- Gaussian asymptote 

To actually solve Hamilton's equations and thus calculate S'(x) and -ft'(x) we need to 
know boundary conditions. The fact that the stationary distribution is approximately 
Gaussian near the fixed point, x*, can provide these [14l[l9]. From Equation ( fTTll we 
see that the differential equation which determines K is linear, and so although the 
dependence on x can be found from this equation, K will only be known up to an 
overall constant. 

From Eq. ([9]) we see that 5'(x*) = and so the WKB approximation ([5j) can 
equivalently be written as 



7r(x) = /^(x) exp {-N [5(x) - 5(x*)]) 



1 



Oi—) 



Expanding about the fixed point for large N to Gaussian order gives 



TT X 



K(x*) exp 



N 



Z*(x 



(20) 



(21) 



The value of K{^*) now follows by asking that the probability distribution is normalised. 
Carrying out the Gaussian integrals one finds that 



2N 

K{^*) = — detZ* . 

TT 



(22) 



The matrix Z at the fixed point, x = x* and p = 0, can be found from Eq. (1191) . Since 
Z = and 



dxidpj 



E 



dwJyi) 



dpidpj 



p=0, x=x* 



p=0, x=x* 



p=0, x=x* 



dxi 



A 



Ywri^lriVj 



0, 



(23) 



we have from Eq. ([T9]) that Z*BZ* + A^Z* + Z*A = 0, or 

B + EA^ + AE = 0, (24) 

where H is the covariance matrix and Z* = S^^. It should be noted that the Gaussian 
result f l2ip is what would be obtained from the system-size expansion to first order 
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3. Solution procedure 



In this section we describe how we calculate the stationary probability density from the 
WKB approximation. As we are primarily concerned with how the form of the stationary 
distribution changes with A^, we fix the basic parameters as: f3 = 1.3, 7 = 1/13, 
/i = 5.5 X 10^^ and 77 = 10"^. These correspond to the childhood disease measles and 
lead to highly oscillatory stochastic dynamics (see Figured]). The effect of changing 
these parameters is discussed later in Section El 

The procedure starts with the Hamiltonian for the SIR model ffTTl) . The problem 
is to find the fiuctuational solutions of Hamilton's equations ([H])- As already discussed, 
there is a single stable fixed point at x = x*, p = 0, and the non-trivial fiuctuational 
trajectories that we seek have p 7^ 0. We can use the results of Section [23] to derive a 
consistent set of initial conditions to equations (]H]) by noticing that 5'(x) is quadratic 
in X — x* near the fixed point [19j. If we define a small perturbation, 6xj, to the fixed 
point, then the corresponding initial perturbations to the momentum along the required 
trajectory are given by 



5pi 



(25) 



P=0, x=x* j j 

The covariance matrix S may be found from Equation ( I24p . since A and B can be found 
from Equation fl23l) . An example of a fiuctuational trajectory found via this procedure 
is shown in Figure [2] The trajectory spirals away from the fixed point and ends when it 
leaves the positive quadrant of the x-y plane. Some care has to be taken when specifying 
the initial perturbations; they must be small enough that Equation (!25l) applies, but not 
so small that numerical errors can build up along the trajectory, due to the exponential 
slow down near the fixed point. For the same reason there is a practical limit to the 
range over which trajectories, and hence 7r(x) can be calculated with this method. 
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Figure 2. An example of a fiuctuational trajectory emanating from the fixed point 
X = X*, p = 0, projected into the (a) x-y plane, and (b) Px-Py plane. The line x = 0.06 
is marked by the red dashed line. 

With the initial conditions, plus ^(x*) = 0, K{x*) = 1 and Z(x*) = S^^, we can 
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simply numerically integrate forward the equations ([H]) , (E]) , dlZl) and (ITSi) simultaneously 
to find a trajectory as well as >S'(x) and K{x.) along it. We have taken K{x*) = 1, rather 
than the value given by Equation f l22l) for convenience. The final results are always 
normalised numerically, so the choice we make here is not important. By changing the 
initial perturbation, Sxj, different trajectories can be found. As the functions S and K, 
and hence vr, are only given along the trajectory, we need a method to interpolate these 
into a distribution over the x-y plane. 

In principle one can use a shooting method to find a trajectory which goes through 
any given point in the phase space, and hence the probability at that point. In practice, 
because of the highly oscillatory nature of the solutions, it is easier to calculate cross- 
sections of the distribution by calculating where a large set of trajectories intersect 
a given line in the x-y plane. The set of trajectories are computed by using initial 
conditions which lie on a small circle surrounding the fixed point \i9\. In this way a 
discretised estimate of 7r(x) can be computed. Figure [3^ shows the functions 5'(x) and 
K{x.) evaluated where trajectories intersect the line x = 0.06. The probability density 
along this cross section is then given by = 7r(x) = i^(x) exp[— A^S'(x)], shown in 
Figure [3b. Simulation results are not shown in Figure Ej but are in excellent agreement 
with the analytic results. Although there is no reason to suppose that these functions 
have simple analytic forms, one could ask if a simple analytic expression can be fitted 
in the asymptotic regime y ^ y*. We have observed that in this regime good fits to 
the functions shown in Figure |3^ are K{y) ~ a/{b + y) and Sijj) ~ {cy + d), for some 
constants a, 6, c and d. There is, however, no analytic justification for these. 

For comparison. Figure ^ shows vr with and without the pre-factor K. Clearly for 
smaller systems, this contribution from the next-to-leading order terms in the expansion 
cannot be ignored when finding an accurate estimate of vr. More generally, we wish 
to calculate the full marginal infective probability density, = X^n Pn,m- This is 
achieved by simply calculating the probability density along a range of cross-sections 
and then summing over them. The resulting marginal density is then normalised such 
that Pm = 1, as it ought to be. 

4. Outbreak distributions 

In this section we present results which show how the outbreak distribution changes 
with population size, N . The mean incidence varies linearly according to m* = Ny*. 
As the mean is decreased, three dynamical regimes can be distinguished, classified by 
population size, which coincide roughly with type I, II and III dynamics originally 
suggested by Bartlett |4]. Type I systems are large; they are those in which fade-out 
does not occur on a practical timescale, thus the disease is endemic. For the parameters 
used in this paper the lower limit on this class is iV ~ 8 x 10^. For type II, medium size 
systems, fade-out can occur but reintroduction is swift and the dynamics are still mainly 
endemic. Type III are small systems which cannot support endemic infection levels. In 
these systems outbreaks are triggered by immigration events, which then rapidly lead to 
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Figure 3. (a) The functions K and 5 evaluated along the line x — 0.06 (see Figure 
Hi), (b) The probability density, 7r(y) = K{y) exp[-A^S'(y)], evaluated for = 5 x 10^, 
2 X 10^, 10^ and 5 x 10^ (solid, dashed, dotted and dot-dashed hues respectively), (c) 
Comparison of 7r(y) calculated with (solid blue curve) and without (red dashed curve) 
the pre-factor K{y); N — 5 x 10~^. The 7r(?/) are left un-normalised in this figure for 
clarity. 

fade-out. Three time series representative of these regimes are shown in Figured] The 
marginal infective distribution, P^, then describes the distribution of outbreak sizes. 
We will primarily be interested in how this changes with N, for type I and II systems. 

4.1. Type I systems 

Figure H] shows the outbreak distributions, calculated via the WKB approximation 
and compared to simulations, for three large populations in which the probability of 
extinction is negligible. For all three the agreement between the result from the WKB 
calculation and the stochastic simulations is excellent. As is decreased so is the mean 
(indicated by the arrows) and the distribution becomes more asymmetric as the fade-out 
boundary starts to exert a bigger influence on the dynamics. These distributions are 
also compared with the Gaussian result given by Equation ( |2T]) (red dashed lines). It 
can be seen that this overestimates the probability of the number of infected becoming 
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Figure 4. Marginal infective probability densities for larger systems. Population sizes 
are (a) TV = 5 x 10'^, (h) N = 2x 10^ and (c) iV = 8 x 10^. Solid green lines are from 
WKB calculations, the red dashed lines show the Gaussian approximation, and the 
noisy blue lines are from stochastic simulations. The arrows mark the mean values. 

small, while severely underestimating the probability of very large outbreaks. The WKB 
approximation remains valid down to m ~ but diverges close to the boundary as 

expected [27] . 

The fluctuations when m is small (usually after an outbreak) determine the size of 
the next outbreak. If a fluctuation takes the system closer to the boundary then the 
probability that the next outbreak will be large is greatly increased. This is because 
if m remains small, a large pool of susceptibles can build up. This effect is illustrated 
in Figure |5l which shows how the outbreak distribution changes with initial conditions 
close to and further away from the boundary. Figure Eb also shows that larger outbreaks 
have a naturally longer period. This feature accounts for the observed changes in the 
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power spectrum of fluctuations in smaller systems, which are systematically shifted to 
lower frequencies, as compared with the — ?■ oo predictions 
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Figure 5. The effect of tlie boundary on outbreak size. Parts (a) and (b) show 
realisations of the stochastic model started from xq = {Nx*, 200) and Xq — {Nx* , 60) 
respectively, with N — 8 x 10^. Dynamics then proceed in an anti-clockwise fashion. 
The reahsations that fluctuate nearer to the boundary tend to move further along the 
n-axis leading to larger outbreaks. Part (c) shows the outbreak distributions — defined 
as when the realisation first crosses the dashed lines — for the two initial conditions. 



4-2. Type II and III systems 

For the population sizes considered in the previous section, fade-out for any length of 
time is very rare, but even so, the proximity of the boundary and the potentially low 
levels of infectious individuals can have a large effect on the dynamics. We now study 
two smaller systems, and show how fade-outs of the disease affect the dynamics, and 
where the WKB approximation starts to break down. The outbreak distribution for 
= 5 X 10^ is shown in Figure [6l The inset shows the asymptote of the distribution 
near the fixed point and there is clearly a non-zero probability of being on the boundary. 
In spite of this, the WKB result is still in very good agreement with the simulation result. 

For smaller populations this agreement starts to break down. This can be seen in 
Figure [7] which shows the outbreak distribution for N = 3 x 10^. Although the WKB 
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Figure 6. Outbreak distribution for = 5 x 10^. Solid green line is the WKB result, 
red dashed line is the Gaussian approximation and the blue dots are from simulations. 
Inset shows the asymptote around the fixed point; the probability of fade out is now 
clearly non-zero. The arrow marks the mean value. 
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Figure 7. Comparison of the full probability density (blue dots) and a single cross- 
section through the middle of the distribution (orange dots) for = 3 x 10^. Again 
the green line, red dashed line and arrow show the WKB, Gaussian and mean result 
respectively. At this system size the WKB approximation cannot capture the full 
distribution, but it does approximate well the central cross-section. The inset shows 
the asymptote of the cross-section about the fixed point, which shows that there is 
significant build up of probability on the boundary. 
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method provides a good approximation to a cross-section through the middle of the 
distribution, it cannot correctly predict the tails of the full marginal distribution, which 
is now strongly bimodal. This is because a large contribution to these tails comes from 
outbreaks which are started on the boundary. Thus to estimate the tails by the WKB 
method one would need to find trajectories which start on the boundary. This might in 
principle be possible — if we could generate initial conditions. 

For the larger system in Figure |6] there are also outbreaks which originate from the 
boundary but their contribution to the overall marginal density can still be calculated 
from trajectories started at the fixed point. This is because the system never moves 
too far along the boundary before an immigration event starts a new outbreak. For 
= 3 X 10^ this is no longer the case as the system can potentially move far along the 
boundary into a region where it is no longer possible to find trajectories via the normal 
methods. 

The dynamics of even smaller systems are dominated by fade-out at the boundary 
as shown in Figure [T}:;. In effect, almost all outbreaks are started from the boundary and 
are of such a large size that they almost always lead to fade-out. Susceptibles then build 
up through births until an immigration event triggers another outbreak. Therefore the 
timing of an outbreak is a random process, with the average outbreak size related to 
the time since the last outbreak (and hence the size of the pool of susceptibles). 

5. Discussion 

In this article we have analytically investigated the outbreak distribution of a stochastic 
SIR model with immigration. This is carried out by calculating both the leading 
order exponential, and next-to-leading order pre-factor in a WKB expansion of the 
master equation. The agreement of these analytic results with stochastic simulations 
is excellent. Although we cannot calculate the full marginal distribution for very small 
systems, where fade-out starts to dominate the dynamics, the WKB calculation is still 
remarkably accurate when calculating cross-sections of the bulk of the distribution. 

The ability of smaller systems — those where the number of infected individuals 
can become small — to generate relatively larger outbreaks is clear in simulations and 
is important for analysing real time series. Up until now this effect has received little 
theoretical attention. In the limit N ^ oo the distribution is Gaussian, a fact which is 
the basis of the system-size expansion [14j. But as is made smaller, the distribution 
becomes highly non-Gaussian as the fade-out boundary approaches the macroscopic 
fixed point. Although the asymptote of the distribution remains somewhat Gaussian in 
shape, the fat tails of the distribution indicate the presence of large outbreaks (relative 
to the mean infectious level). These tails cannot be captured within the Gaussian 
approximation. Clearly these results are relevant to a number of models of this type; 
the SIR model is just one example of a generalised predator-prey model [32] . 

For smaller systems, the next-to-leading order term makes a significant contribution 
to the calculation of the probability density (see Figure Most studies of two- 
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dimensional systems up until now have obtained approximate results for fixation 
times in various limits where Hamilton's equations can be simplified and solved 
analytically [23]425] . Although we have not been concerned with fixation in this paper, 
our work has implications for calculating the mean time to extinction using these 
methods. By definition, if a system is small enough that extinction can happen on 
an observable time scale, then there will be large corrections to the first-order result 
from the next-to-leading order terms in the WKB expansion. 

Throughout this paper we have fixed the basic parameters of the system 
corresponding to a disease like measles, which has highly oscillatory stochastic dynamics 
and is typical of many childhood diseases [12] . Our aim has been to illustrate the method 
which we use, giving enough detail so that others can use it for other parameter values 
or other models. The choice of parameters was therefore not chosen on theoretical 
grounds; in principle any set which showed similar effects could have been used. While 
this is true for the standard SIR parameters (3, 7 and /x, it is worthwhile stressing the 
role of the parameter rj, which is less standard. This parameter governs the rate of 
new infections brought back by susceptibles visiting outside populations. In the natural 
dynamics this can be thought of as a damping term because it is constantly depleting 
the pool of susceptibles, thus the larger rj is set, the smaller and less coherent the size 
of the stochastic oscillations for a given population size [8]. Such a system will show 
the same overall changes in the form of the outbreak distribution, but the onset will be 
later at comparatively smaller system sizes. 

One of the factors we have ignored in this paper is the influence of seasonal forcing 
on the epidemic dynamics. This is known to be very important for childhood diseases 
and arises due to the aggregation of children in schools during term time. To include this, 
the transmission term /3 is made a function of time with a period of one year. The forcing 
then induces a limit cycle in the macroscopic dynamics with stochastic amplification 
of the transients about this [H]. One of the most interesting and biologically relevant 
features of these forced models is the existence of multiple stable attractors; for example 
the coexistence of 1 and 2- year cycles [331IM] . The WKB methods outlined in this article 
should be suitable for analysing the rate of switching between these attractors and for 
showing how this changes with system size [35]. A quantitative stochastic theory of 
this phenomenon would be a valuable addition to our overall understanding of these 
epidemic systems. 
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